transformer model

Terms from Artificial Intelligence: humans at the heart of algorithms

A transformer model operates on time series or sequential data using a form of attention, identifying past states/tokens that are particularly related to the current token and then giving these extra weight when predicting future tokens. This is like the human ability to read a sentence such as "The cat sat in the deep orange glow of sunset and licked its fur" -- when you read the word 'licked', your mind instantly pulls out the word 'cat' as related and uses that to make sense of the current point in the sentence. We use rich semantic structures to perform this, but transformer models use a vector similarity between a 'key' and a 'query' for each token, where the key models the kind of thing that the input token is, and the query models the kind of thing it would like to connect with.
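
The key/query matching described above can be illustrated with a minimal sketch. It assumes the similarity is a scaled dot product followed by a softmax, which is a common choice but is not specified in this entry. The function names, the token list and the two-dimensional vectors are all hypothetical and hand-picked for illustration; in a real model the keys and queries are learned.

```python
import math

def softmax(scores):
    """Turn raw similarity scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Weight each earlier token by how well its key matches the current query.

    Assumption: similarity is a dot product scaled by sqrt(vector length).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical, hand-picked vectors (not learned values).
tokens = ["The", "cat", "sat", "sunset"]
keys = [[0.1, 0.0], [2.0, 0.1], [0.2, 1.0], [0.0, 0.3]]
licked_query = [3.0, 0.0]  # what 'licked' would like to connect with

weights = attention_weights(licked_query, keys)
most_related = tokens[weights.index(max(weights))]
print(most_related)  # the earlier token given the largest weight
```

With these made-up vectors the key for 'cat' lines up most closely with the query for 'licked', so 'cat' receives the largest weight. In a full transformer these weights would then be used to blend per-token 'value' vectors, a step this sketch leaves out.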

Defined on page 332

Used on pages 332, 341, 464, 540, 573, 582

Also known as transformer networks